Skip to content

Conversation

Half-Shot
Copy link
Member

Fixes #78

This adds a new endpoint POST /_matrix/media_proxy/unstable/scan_file which takes a multipart body of a file and JSON object for decrypting.

@Half-Shot Half-Shot requested a review from a team as a code owner July 1, 2025 13:55
Copy link
Contributor

@reivilibre reivilibre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks sensible overall


### `POST /_matrix/media_proxy/unstable/scan_file`

Performs a scan on a file body without uploading to Matrix. This request takes a multi-part / form data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just thinking this might be more easily recognisable as a literal content encoding name:

Suggested change
Performs a scan on a file body without uploading to Matrix. This request takes a multi-part / form data
Performs a scan on a file body without uploading to Matrix. This request takes a `multipart/form-data`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Content-Disposition#as_a_header_for_a_multipart_body is probably the best link I can find on MDN that describes how this is encoded. A bit weak but might be better than nothing

Comment on lines +140 to +141
| `body` | [Blob](https://developer.mozilla.org/en-US/docs/Web/API/Blob) | The file body. |
| `file` | EncryptedFile as JSON string | The metadata (decryption key) of an encrypted file. Follows the format of the `EncryptedFile` structure from the [Matrix specification](https://spec.matrix.org/v1.2/client-server-api/#extensions-to-mroommessage-msgtypes). Only required if the file is encrypted. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unimportant: I kinda find these param names a smidge confusing; why is the file not the file? ;p

Performs a scan on a file body without uploading to Matrix. This request takes a multi-part / form data
body.

Response format:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like the request format; meanwhile we don't specify the response format except through example (which is obvious enough that it's probably fine, don't get me wrong)

Comment on lines +340 to +342
removal_command_parts = self._removal_command.split()
removal_command_parts.append(file_path)
subprocess.run(removal_command_parts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be quite tempted to pull this out into its own function, to make it easier to modify the removal logic in future

(and if we're being pedantic, really this should be non-blocking/async, but I appreciate this is probably what was already here)


validate_encrypted_file_metadata(metadata)

# URL parameter is ignored.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really sure what this comment means

Comment on lines +506 to +511
with open(full_path, "wb") as fp:
while True:
chunk = await multipart.read_chunk()
if not chunk:
break
fp.write(chunk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would probably be better if it was async, but seems that we might have to pull in a library for this. I also guess you aren't the first to do it this way.
If you were interested though, https://pypi.org/project/aiofile/ looks reasonable

if not chunk:
break
fp.write(chunk)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fails to write the file, should it delete the partial file?

Comment on lines +330 to +332
if metadata is not None:
with open(file_path, "rb") as f:
content = f.read()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm.. is there any point in writing the encrypted file to disk, if we're just going to read it all back in to memory again?

If we can have clients send the encryption metadata as the first body part, that would mean we could do the right thing a little easier.

Although maybe it would be better if we could decrypt files without having them all in memory

That said: if this is all stuff that was a problem before you, it's also fine to leave it

str(uuid.uuid4()), media_content
)

# Remove source file now we've decrypted it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this also be done in a finally block if the file fails to decrypt?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to scan a file before it's uploaded.
2 participants